Adapter Generation for Extracting and Querying Data from Web
Abstract
Accessing and integrating data from heterogeneous sources has become a significant challenge. So-called adapters translate SQL queries into queries the source understands and convert the results into a common model. In this paper, we present our approach to an adapter for Web sources, which is configured by specifying a source-specific extraction function. We focus on two main tasks: query modification in order to extend the source capabilities, and data extraction. The extraction step is based on an operational description that enables interactive exploration of the result format during the development phase. Finally, we present our ideas for the semi-automatic discovery of extraction patterns by analyzing example documents.
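As a rough illustration of the two tasks named in the abstract, the Python sketch below splits a query into the part a Web source can evaluate and a residual part that the adapter applies itself, then extracts rows from the result page with a source-specific extraction function. It is not the authors' implementation. The function names (`build_source_query`, `extract_rows`, `run_adapter`), the example URL, the HTML table layout, and the restriction to equality conditions are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlencode

Row = Dict[str, str]


def build_source_query(
    base_url: str, supported: Iterable[str], conditions: Dict[str, str]
) -> Tuple[str, Dict[str, str]]:
    """Query modification (hypothetical): push the equality conditions the
    source supports into the URL and return the rest as residual conditions."""
    supported_set = set(supported)
    pushed = {k: v for k, v in conditions.items() if k in supported_set}
    residual = {k: v for k, v in conditions.items() if k not in supported_set}
    url = f"{base_url}?{urlencode(pushed)}" if pushed else base_url
    return url, residual


# Assumed result format: one table row per record with three cells.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<title>[^<]*)</td>"
    r"<td>(?P<author>[^<]*)</td>"
    r"<td>(?P<year>[^<]*)</td></tr>"
)


def extract_rows(html: str) -> List[Row]:
    """Source-specific extraction function (hypothetical): map each matching
    table row to a dictionary, standing in for the common model."""
    return [m.groupdict() for m in ROW_PATTERN.finditer(html)]


def run_adapter(
    html: str, extract: Callable[[str], List[Row]], residual: Dict[str, str]
) -> List[Row]:
    """Extract rows from a fetched page and apply the residual conditions
    that the source itself could not evaluate."""
    rows = extract(html)
    return [r for r in rows if all(r.get(k) == v for k, v in residual.items())]


# Usage: the source can only filter by author, so the year condition is
# evaluated by the adapter after extraction. The page is given inline here
# instead of being fetched from the (made-up) URL.
url, residual = build_source_query(
    "http://example.org/search", ["author"], {"author": "Smith", "year": "1999"}
)
page = (
    "<table>"
    "<tr><td>A</td><td>Smith</td><td>1999</td></tr>"
    "<tr><td>B</td><td>Smith</td><td>1998</td></tr>"
    "</table>"
)
result = run_adapter(page, extract_rows, residual)
```

Passing the extraction function as a parameter mirrors the idea that the adapter is configured per source: supporting a different source would mean supplying a different `extract` callable while the query-splitting and filtering logic stays the same.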
Similar resources
Adapter Generation for Extracting and Querying Data from Web Sources
Accessing and integrating data from heterogeneous sources has become a significant challenge. So-called adapters provide the functionality for translating SQL queries into queries understandable by the source as well as converting the results into a common model. In this paper, we present our approach to an adapter for Web sources, which is configurable by specifying a source-specific extraction...
Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information
With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...
An XML-enabled data extraction toolkit for web sources
The amount of useful semi-structured data on the web continues to grow at a stunning pace. Often interesting web data are not in database systems but in HTML pages, XML pages, or text files. Data in these formats are not directly usable by standard SQL-like query processing engines that support sophisticated querying and reporting beyond keyword-based retrieval. Hence, the web users or applicat...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A Framework for Improved Access to Museum Databases in the Semantic Web
Digital museum databases have extremely heterogeneous data structures which require advanced mapping and vocabulary integration for them to benefit from the interoperability enabled by semantic technologies. In addition to establishing ways of extracting and manipulating digitally encoded cultural material, there exists a need to make this material available and accessible to human users in dif...
Publication date: 1999